Operating systems introduce two fundamental abstractions: files and processes. Binary (executable) files can be viewed as a static representation of resources, while processes can be viewed as a dynamic representation of resources. The process of transforming the static entity (the binary executable file) into a dynamic entity (the process) is called loading. The loader, a piece of code that is part of the operating system, has to read the binary executable file, allocate resources (e.g. memory), create the OS data structures that represent a live process, and, ultimately, set the instruction pointer to the very first instruction of the program.
For this, the loader requires information such as the process' memory layout and the address of the first instruction. All this (meta-)information resides in the executable file, in a format that the loader has to understand; hence, each loadable executable binary has a specific format.
During this lab we will focus on the static view: executable files and basic methods for analyzing them without being required to run the program.
Sun Microsystems' SunOS came up with the concept of dynamic shared libraries and introduced it to UNIX in the late 1980s. UNIX System V Release 4, which Sun co-developed, introduced the ELF object format, adapted from the Sun scheme. ELF was later developed and published as part of the ABI (Application Binary Interface) as an improvement over COFF, the previous object format, and by the late 1990s it had become the standard for UNIX and UNIX-like systems, including Linux and the BSD derivatives. Several specifications with minor changes have emerged for different processor architectures, but for this lab we will focus on the ELF-32 format.
As discussed above, executable files contain (in addition to the actual executable code) metadata that the loader needs in order to start a given program. Linux commonly uses the ELF format to hold at least the following program metadata:
The figure below shows how ELF sections and segments are organized: the section header table contains linking information for (static) sections, while the program header table describes the run-time memory layout to the loader using segments. For example, here the .text and .rodata sections are both part of the same (read-only) program segment.
Let's suppose we want to find out information about the 32-bit hello program included in the lab archive. A first step would be to look at the header:
```
$ readelf -h hello
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 03 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - GNU
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x804887f
  Start of program headers:          52 (bytes into file)
  Start of section headers:          726696 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         6
  Size of section headers:           40 (bytes)
  Number of section headers:         31
  Section header string table index: 28
```
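We observe the following:
- The entry point, i.e. the address of the first instruction to be executed, is 0x804887f. Note that this assumes that the address will contain code after the program is loaded.
- The program header table starts at offset 52 in the file.
- The section header table starts at offset 726696 in the file.

Looking at the program sections: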
```
$ readelf -S hello
There are 31 section headers, starting at offset 0xb16a8:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .note.ABI-tag     NOTE            080480f4 0000f4 000020 00   A  0   0  4
  [ 2] .note.gnu.build-i NOTE            08048114 000114 000024 00   A  0   0  4
  ...
  [ 6] .text             PROGBITS        080482d0 0002d0 0733ac 00  AX  0   0 16
  ...
  [10] .rodata           PROGBITS        080bc1c0 0741c0 01a44c 00   A  0   0 32
  ...
  [24] .data             PROGBITS        080eb060 0a2060 000f20 00  WA  0   0 32
  [25] .bss              NOBITS          080ebf80 0a2f80 000e0c 00  WA  0   0 32
  ...
Key to Flags:
  W (write), A (alloc), X (execute) ...
```
We see that .text, .rodata, .data and .bss are all to be loaded into memory (flag A, alloc), that .text contains executable code (flag X), and that .data and .bss contain writable data (flag W). The actual permissions, however, are determined by the segments, so let's look at those:
```
$ readelf -l hello

Elf file type is EXEC (Executable file)
Entry point 0x804887f
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0xa168b 0xa168b R E 0x1000
  LOAD           0x0a1f5c 0x080eaf5c 0x080eaf5c 0x01024 0x01e48 RW  0x1000
  NOTE           0x0000f4 0x080480f4 0x080480f4 0x00044 0x00044 R   0x4
  TLS            0x0a1f5c 0x080eaf5c 0x080eaf5c 0x00010 0x00028 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x0a1f5c 0x080eaf5c 0x080eaf5c 0x000a4 0x000a4 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.ABI-tag .note.gnu.build-id .rel.plt .init .plt .text __libc_freeres_fn __libc_thread_freeres_fn .fini .rodata __libc_subfreeres __libc_IO_vtables __libc_atexit __libc_thread_subfreeres .eh_frame .gcc_except_table
   01     .tdata .init_array .fini_array .jcr .data.rel.ro .got.plt .data .bss __libc_freeres_ptrs
   02     .note.ABI-tag .note.gnu.build-id
   03     .tdata .tbss
   04
   05     .tdata .init_array .fini_array .jcr .data.rel.ro
```
Our hello executable contains six segments, the first of which aggregates read-only data and program code, while the second contains writable sections, etc.
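Note that .rodata and .text are both mapped as read-only and executable (segment 00 has the flags R E). This is interesting from a security perspective: constant data that never needs to run ends up in executable memory.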
Finally, we can inspect all the symbols in the binary:
```
$ readelf -s hello | less

Symbol table '.symtab' contains 1984 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
...
  1544: 08054690    87 FUNC    GLOBAL DEFAULT    6 _IO_default_uflow
  1545: 0805cf60    43 IFUNC   GLOBAL DEFAULT    6 memset
  1546: 0806d290    10 FUNC    GLOBAL DEFAULT    6 __wmempcpy
  1547: 0807c330    30 FUNC    WEAK   DEFAULT    6 __strtol_l
  1548: 080489cc    46 FUNC    GLOBAL DEFAULT    6 main
  1549: 080a2830  1957 FUNC    GLOBAL DEFAULT    6 _dl_start_profile
  1550: 080ecc98     4 OBJECT  GLOBAL DEFAULT   25 _dl_origin_path
...
```
The symbol table contains information such as each symbol's address (Value), its size, its type (e.g. a function or a data object), its binding (e.g. GLOBAL or WEAK) and the index of the section it belongs to (Ndx).
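We remember that compilation goes through the following phases:
- the source file (.c file), written in a high-level language (C, in our case), is preprocessed and compiled into an assembly source file;
- the assembly source file is assembled into an object file (.o);
- the object files are linked together into an executable.

Each binary file in the compilation process has an executable format attached to it. Particularly in the case of ELF, we have the following types of files:
- relocatable files (object files, .o), which hold code and data meant to be linked with other object files;
- executable files, which hold a program ready to be loaded and run;
- shared object files (.so), i.e. libraries meant to be linked dynamically.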
Most executable files are obtained from multiple object files, through either static linking or dynamic linking. Static linking involves processing each piece of code from each object file and merging everything into a single binary that contains all the machine code the program needs. This approach, still in use today, means that all of the code and data is loaded into memory, regardless of whether a given run actually uses it.
The ELF format also allows executable files to be dynamically linked. Instead of linking all the object files that contain subroutines into the final binary, the subroutines are organized into separate binaries, called libraries, that can be loaded on demand. Essentially, each library is loaded into memory only once; when a program instance requires a subroutine from a specific library, it asks a special system component (the dynamic linker) for it, and new resources are allocated only for the volatile (writable) parts of the library image (.bss and .data).
Let's look through hello.o similarly to how we previously looked through hello. What is different?
- The ELF header (readelf -h): the file doesn't have an entry point and the ELF type is specified as "Relocatable file".
- The sections (readelf -S): they look very similar to the ones we inspected previously. What is missing? Any idea why?
Additionally, object files have a relocation table, i.e. a list of the places in the code that refer to symbols whose final addresses are not yet known, such as symbols external to the file. Let's look at the one in hello.o:
```
$ readelf -r hello.o

Relocation section '.rel.text' at offset 0x19c contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000015  00000501 R_386_32          00000000   .rodata
0000001a  00000a02 R_386_PC32        00000000   puts
...
```
We notice that one of the external symbols is puts. Since it is part of the C library, the linker must resolve its location and patch every reference to it using the symbol's address.
As discussed in previous labs, we can disassemble ELF executable files on almost any Linux system using objdump with the -d or the -D flag:
```
$ objdump -D -M intel hello

hello:     file format elf32-i386

...
Disassembly of section .init:

080482a8 <_init>:
 80482a8:   53                      push   ebx
 80482a9:   83 ec 08                sub    esp,0x8
 80482ac:   e8 8f 00 00 00          call   8048340 <__x86.get_pc_thunk.bx>
 80482b1:   81 c3 4f 1d 00 00       add    ebx,0x1d4f
 80482b7:   8b 83 fc ff ff ff       mov    eax,DWORD PTR [ebx-0x4]
 80482bd:   85 c0                   test   eax,eax
 80482bf:   74 05                   je     80482c6 <_init+0x1e>
 80482c1:   e8 3a 00 00 00          call   8048300 <__libc_start_main@plt+0x10>
 80482c6:   83 c4 08                add    esp,0x8
 80482c9:   5b                      pop    ebx
 80482ca:   c3                      ret
```
What is the difference between -d and -D? What does -M do? In general, we encourage you to check out the man pages to find out.
Sometimes, however, the code we are dealing with doesn't have any useful metadata associated with it, e.g. it comes in a raw (flat) binary form, its executable format is not recognized, or its ELF header is corrupted. Let's take, for example, the hello2 binary generated from hello2.S in the lab archive:
```
$ objdump -D hello2
objdump: hello2: File format not recognized
$ file hello2
hello2: data
```
We can force objdump to attempt disassembling raw files by passing -b binary (the -b flag selects the input object format). In this case, however, objdump does not assume any target architecture, so we must pass it explicitly using -m. For example:
```
$ objdump -D -b binary -m i386 -M intel hello2

hello2:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   66 ba 0e 00             mov    dx,0xe
   4:   00 00                   add    BYTE PTR [eax],al
   6:   66 b9 24 00             mov    cx,0x24
   a:   00 00                   add    BYTE PTR [eax],al
   c:   66 bb 01 00             mov    bx,0x1
  10:   00 00                   add    BYTE PTR [eax],al
  12:   66 b8 04 00             mov    ax,0x4
  16:   00 00                   add    BYTE PTR [eax],al
  18:   cd 80                   int    0x80
  1a:   66 b8 01 00             mov    ax,0x1
  1e:   00 00                   add    BYTE PTR [eax],al
  20:   cd 80                   int    0x80
  22:   00 00                   add    BYTE PTR [eax],al
  24:   48                      dec    eax
  25:   65 6c                   gs ins BYTE PTR es:[edi],dx
  27:   6c                      ins    BYTE PTR es:[edi],dx
  28:   6f                      outs   dx,DWORD PTR ds:[esi]
  29:   2c 20                   sub    al,0x20
  2b:   77 6f                   ja     0x9c
  2d:   72 6c                   jb     0x9b
  2f:   64 21 0a                and    DWORD PTR fs:[edx],ecx
```
Looking back at the hello2.S source file, we notice that the disassembled code maps almost directly to it. The last part of the binary does not contain any meaningful code, because here objdump also attempts to disassemble data (the "Hello, world!" string starting at offset 0x24).
To obtain the raw data, we can just dump the binary using hexdump or xxd:
```
$ xxd hello2
00000000: 66ba 0e00 0000 66b9 2400 0000 66bb 0100  f.....f.$...f...
00000010: 0000 66b8 0400 0000 cd80 66b8 0100 0000  ..f.......f.....
00000020: cd80 0000 4865 6c6c 6f2c 2077 6f72 6c64  ....Hello, world
00000030: 210a                                     !.
```
The purpose of this task is to get you acquainted with some tools that can be used to manipulate ELF files.
shellcode.c contains a buffer, SC, that holds raw instructions.
readelf -s ./shellcode | grep SC
```
$ readelf -S ./shellcode
  [24] .data PROGBITS 0000000000601020 0000000000000058 0000000000000000 WA 0 0 32
```
objcopy --set-section-flags .data=alloc,code,load ./shellcode
```
 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
```
```
LOAD 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28 0x0000000000000250 0x0000000000000260 RW 200000
```
execstack -s ./shellcode
This works because the stack is near the data, due to alignment [citation needed].
gcc -O0 -o shellcode shellcode.c
./shellcode generate > mycode.bin
```
$ file ./mycode.bin
./mycode.bin: data
```
Is file working? Is it a false positive?
chmod +x ./mycode.bin && ./mycode.bin
The problem so far is that the shellcode (SC) ends up in a segment that does not have the executable bit set. One solution is to remap the segment (page) at runtime with the exec flag; this requires writing some code. We will focus on another solution, which uses existing tools and ELF's capabilities:
objcopy -I binary -O elf64-x86-64 ./mycode.bin ./mycode.bin.o
objcopy -I elf64-x86-64 --set-section-flags .data=alloc,code,load ./mycode.bin.o
elfedit --output-mach x86-64 ./mycode.bin.o
Note the .data section!
```
$ readelf -s ./mycode.bin.o
0000000000000035 D _binary___mycode_bin_end
0000000000000035 A _binary___mycode_bin_size
0000000000000000 D _binary___mycode_bin_start
```
gcc -O0 use-my-code.c ./mycode.bin.o -o my
execstack -c ./my
Why does execstack -c ./*.o throw an error? (execstack needs information about the segments, which is only available after the linking process.)
readelf -e | grep .data: check for segment 03 (which maps the .data section).
If calling func causes a Segmentation Fault, it's likely that your system produces PIE executables by default. Modify the “Compile and link!” command to:
gcc -no-pie -O0 use-my-code.c ./mycode.bin.o -o my
Someone has given us a stripped binary called stripped. Let's run it and take a brief look at it:
```
$ ./stripped
Hello, there!
I am looping, looping, looping, looping, looping,
$ file ./stripped
./stripped: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
```
The executable file is stripped, so we can't rely on any symbol information to look at it. However, it's small enough, so we can try to reverse engineer it by hand. To do that, answer the following questions:
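- What operands does each call instruction receive during execution?
- Where are the ret instructions placed relative to the call operands?
- What is the relationship between call and ret?
- You can dump the contents of the binary's sections with objdump using the -s flag. Use this to figure out what pointers to contents from .data are put into registers.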
Looking more carefully at our stripped binary, we notice that there is one string that it never prints out:
```
$ strings -t x stripped
    10c Hello, there!
    11a I am looping,
    129 All done!
    134 .shstrtab
    13e .text
    144 .data
```
The string All done! is at offset 0x129 in the binary, which corresponds to address 0x8049129 in the loaded program.
```
$ objdump -D stripped -M intel | grep -A 2 -B 2 8049129
 8048080:   c3                      ret
 8048081:   ba 0a 00 00 00          mov    edx,0xa
 8048086:   b9 29 91 04 08          mov    ecx,0x8049129
 804808b:   e8 61 00 00 00          call   0x80480f1
 8048090:   c3                      ret
```
This means that the function that prints it (starting at 0x8048081) is never reached! Why? Because the program exits before getting there.
Find the call to the exit function that occurs at run time right before this print and manually replace it with NOP instructions, using the hex editor of your choice. In the end, the program should display the following:
```
./stripped
Hello, there!
I am looping, looping, looping, looping, looping,
All done!
```
Note that the program should still exit cleanly!
The opcode of the NOP instruction is 0x90, so just replace all the bytes of the offending call instruction with 0x90.
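Using your newfound voodoo skills, you are now able to tackle the following task. In the middle of two programs, I added the following lines:
```
{
    int i;
    int *a[1];

    /* a has one element: a[1] to a[19] deliberately read past its end,
       printing whatever lies above it on the stack (undefined behaviour) */
    for (i = 0; i < 20; i++)
        printf("%p\n", (void *)a[i]);
}
```
The results were the following, respectively: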
```
0x804853b
0x1
0x8048530
(nil)
(nil)
0xf7e0ace5
0x1
0xffffce64
0xffffce6c
0xf7ffcfc0
0x1c
(nil)
0xf7fda4c8
0x2
0xffffce60
0xf7f94e54
(nil)
(nil)
(nil)
0xd545cf8d
```
and
```
0xbfffe7d0
0xd696910
0x80484a9
0xb7fffbe8
0x3
0xb7ffefc0
0xb7df6a84
0x1
0xb7fdc780
0xb7fe75fc
0x804c008
0xb7e59195
0x804c008
0xb7fdb000
0xb7fdc000
0x1
0xffffffff
0x3
(nil)
0xf3b9a5b
```
Answer these questions:
The change-header directory contains a file named main.bad.
- What is the type of main.bad, as reported by the file command?
- Using unscramble.py, please fix the ELF header! readelf -h ./main.ok should not complain at all.
- Using symbol.map and further extending unscramble.py, try to directly call the main and call_me functions.
  - Can you call the main function directly? Why?
  - Can you call the call_me function directly? Why?
- Expected output files: main.ok.main, main.ok.call_me, main.ok.real_main. readelf -h main.ok* should not complain.